Here we examine whether publishing volume affects overall, article, or place traffic: specifically, whether total, average, or median traffic changes as we publish more.

TL;DR: the results are somewhat inconclusive, but we're probably better off publishing about 6 places per day. For articles, we seem to perform better the more we publish, but only up to a point. That point seems to be around 12 or 13 articles per day.


In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
df = pd.read_csv('All content.csv', index_col='Published', parse_dates=True)
df['count'] = 1
df = df[df['Page Views'] > 200]  # drop low-traffic noise

In [7]:
df_resampled = df.resample('D').sum()

Let's truncate the data to the period between June 1, 2015 and March 4, 2016. This keeps very old content out of the window while also excluding anything less than six days old.
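The daily-sum-then-truncate pattern used in the next two cells can be sketched on a few synthetic rows (the dates and numbers here are made up for illustration):

```python
import pandas as pd

# Synthetic example of the daily-sum + truncate pattern
idx = pd.to_datetime(['2015-05-30', '2015-06-01', '2015-06-01', '2015-06-02'])
toy = pd.DataFrame({'Page Views': [100, 300, 500, 250], 'count': 1}, index=idx)

daily = toy.resample('D').sum()                             # one row per calendar day
window = daily.truncate(before='2015-06-01', after='2015-06-02')  # clip the date range
print(window['Page Views'].tolist())  # [800, 250]
print(window['count'].tolist())       # [2, 1]
```

The 'count' column of ones becomes a per-day publish count once summed, which is what the scatter plots below rely on.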


In [8]:
df_trunc = df_resampled.truncate(before='2015-06-01', after='2016-03-04')

In [9]:
df_trunc = df_trunc.dropna()

In [10]:
df_trunc = df_trunc[['Page Views', 'Social Actions', 'Social Referrals', 'Facebook Shares', 'count']]
df_trunc['mean'] = df_trunc['Page Views'] // df_trunc['count']  # floor division: whole-number PVs per piece

Here we plot total page views ("PVs total") and average page views against how many pieces of content were published in a given day.

From this we see increasing returns to publishing more, but the data are sparse at the high end of the range.
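One way to put a number on that visual impression is a simple correlation coefficient; a minimal sketch with made-up daily totals (column names mirror the real data):

```python
import pandas as pd

# Hypothetical daily totals: does traffic scale with publishing volume?
daily = pd.DataFrame({
    'count':      [4, 6, 8, 10, 12],
    'Page Views': [5000, 9000, 12000, 17000, 20000],
})

# Pearson correlation between publish count and total page views
r = daily['count'].corr(daily['Page Views'])
print(round(r, 3))
```

On the real data this would give a single summary number to set beside the scatter plots, keeping in mind that correlation says nothing about the sparse high-count days.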


In [11]:
df_trunc.plot(kind='scatter',x='count',y='Page Views',title='PVs total')
df_trunc.plot(kind='scatter',x='count',y='mean',title='Average PVs')
df_trunc.plot(kind='scatter',x='count',y='Facebook Shares',title='Total Facebook Shares')


Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x1197ce350>

In [12]:
df2 = pd.read_csv('All content.csv',index_col='Published',parse_dates=True)

df_articles = df2[df2['Url'].str.contains('/articles/', na=False)].copy()
df_places = df2[df2['Url'].str.contains('/places/', na=False)].copy()
df_articles['count'] = 1
df_places['count'] = 1


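Filtering a DataFrame and then assigning a new column on the result can trigger pandas' SettingWithCopyWarning; taking an explicit .copy() makes the slice an independent frame and silences it. A minimal sketch:

```python
import pandas as pd

df2 = pd.DataFrame({'Url': ['/articles/a', '/places/b', None],
                    'Page Views': [100, 200, 300]})

# .copy() makes the filtered slice an independent frame, so the
# column assignment below no longer targets a possible view of df2
df_articles = df2[df2['Url'].str.contains('/articles/', na=False)].copy()
df_articles['count'] = 1
print(df_articles['count'].tolist())  # [1]
```

na=False is needed because rows with a missing Url would otherwise propagate NaN through the boolean mask.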

In [13]:
df_articles_resampled = df_articles.resample('D').sum()
df_articles_trunc = df_articles_resampled.truncate(before='2015-06-01', after='2016-03-04')
df_articles_trunc = df_articles_trunc.dropna()
df_articles_trunc = df_articles_trunc[['Page Views', 'Social Actions', 'Social Referrals', 'Facebook Shares', 'count']]
df_articles_trunc['mean']=df_articles_trunc['Page Views']//df_articles_trunc['count']

Articles

Here we plot the average and total pageviews for articles against the number of articles published per day. Performance improves for a while, but then drops off once article publishing exceeds 13 per day.

Even so, the correlation is not very strong.

In [14]:
df_articles_trunc.plot(kind='scatter',x='count',y='Page Views',title='Articles PVs total')
df_articles_trunc.plot(kind='scatter',x='count',y='mean',title='Articles Average PVs')
df_articles_trunc.plot(kind='scatter',x='count',y='Facebook Shares',title='Total Facebook Shares')


Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x119766c90>

In [15]:
df_places_resampled = df_places.resample('D').sum()
df_places_trunc = df_places_resampled.truncate(before='2015-06-01', after='2016-03-04')
df_places_trunc = df_places_trunc.dropna()
df_places_trunc = df_places_trunc[['Page Views', 'Social Actions', 'Social Referrals', 'Facebook Shares', 'count']]
df_places_trunc['mean']=df_places_trunc['Page Views']//df_places_trunc['count']

Places

It appears that there is only a very weak correlation between the number of places published per day and either total or average place performance.


In [16]:
df_places_trunc.plot(kind='scatter',x='count',y='Page Views',title='Places PVs total')
df_places_trunc.plot(kind='scatter',x='count',y='mean',title='Places Average PVs')
df_places_trunc.plot(kind='scatter',x='count',y='Facebook Shares',title='Total Facebook Shares')


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x1199a4fd0>

Median Article Traffic


In [17]:
df_articles_resampled2 = df_articles.resample('D').median()
df_articles_trunc2 = df_articles_resampled2.truncate(before='2015-06-01', after='2016-03-04')
df_articles_trunc2 = df_articles_trunc2.dropna()
df_articles_trunc2 = df_articles_trunc2[['Page Views', 'Social Actions', 'Social Referrals', 'Facebook Shares']]
df_articles_trunc2['count']=df_articles_trunc['count']

In [18]:
df_articles_trunc2.plot(kind='scatter',x='count',y='Page Views',title='Median Articles PVs')


Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x11549ae90>

Again, there is nearly zero correlation between median article traffic per day and publishing volume.

Let's look at Places, just to be sure.


In [19]:
df_places_resampled2 = df_places.resample('D').median()
df_places_trunc2 = df_places_resampled2.truncate(before='2015-06-01', after='2016-03-04')
df_places_trunc2 = df_places_trunc2.dropna()
df_places_trunc2 = df_places_trunc2[['Page Views', 'Social Actions', 'Social Referrals', 'Facebook Shares']]
df_places_trunc2['count']=df_places_trunc['count']
df_places_trunc2.plot(kind='scatter',x='count',y='Page Views',title='Median Places PVs')


Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x11d3d5d90>

It looks like there is actually a weak relationship between median place traffic and overall publishing volume, with an apparent optimum around 6 places per day.
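One way to make that "around 6 per day" read-off concrete is to group days by publish count and compare medians; a sketch with made-up numbers (not the real data):

```python
import pandas as pd

# Hypothetical (publish count, median daily PVs) observations for place content
days = pd.DataFrame({'count':      [2, 2, 6, 6, 6, 10, 10],
                     'Page Views': [400, 600, 900, 1100, 1000, 700, 500]})

# Median traffic at each publish volume, then the volume that maximizes it
by_count = days.groupby('count')['Page Views'].median()
print(by_count.idxmax())  # publish volume with the highest median traffic
```

With the real df_places data this would turn the visual judgment from the scatter plot into a single number, though with so few high-volume days the answer should be treated as indicative, not definitive.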



In [22]:
import statsmodels.formula.api as smf
df_trunc


Out[22]:
Page Views Social Actions Social Referrals Facebook Shares count mean
Published
2015-06-01 16796 602 2700 91 7 2399
2015-06-02 15121 1596 3360 291 8 1890
2015-06-03 31562 447 1316 70 7 4508
2015-06-04 8289 630 2170 117 7 1184
2015-06-05 3070 62 181 11 7 438
2015-06-08 12709 1117 1886 193 11 1155
2015-06-09 8516 755 1206 100 10 851
2015-06-10 9727 232 236 29 8 1215
2015-06-11 16643 1774 5565 315 9 1849
2015-06-12 11254 1928 4839 409 8 1406
2015-06-15 17122 3473 7574 549 6 2853
2015-06-16 3815 174 70 19 7 545
2015-06-17 6870 1045 2058 217 9 763
2015-06-18 139729 19402 82383 3302 9 15525
2015-06-19 11084 2900 4758 434 7 1583
2015-06-22 4926 858 665 146 8 615
2015-06-23 7946 1101 1533 91 6 1324
2015-06-24 19839 2807 5596 435 7 2834
2015-06-25 12148 989 5419 128 9 1349
2015-06-26 95266 7429 56449 1306 8 11908
2015-06-29 6841 263 917 53 4 1710
2015-06-30 77753 6421 54201 1039 9 8639
2015-07-01 5562 1096 866 233 8 695
2015-07-02 8516 1233 2027 253 9 946
2015-07-03 7203 788 1292 151 4 1800
2015-07-06 13168 2210 3257 383 8 1646
2015-07-07 85092 14057 56996 2308 9 9454
2015-07-08 11425 1444 4428 291 9 1269
2015-07-09 9918 2249 2413 413 7 1416
2015-07-10 10207 1717 2693 289 7 1458
... ... ... ... ... ... ...
2016-02-04 76705 6446 27421 882 17 4512
2016-02-05 116980 7799 50287 1477 17 6881
2016-02-06 10747 807 3768 142 2 5373
2016-02-07 4885 530 1349 124 2 2442
2016-02-08 77578 19993 39979 3232 11 7052
2016-02-09 93031 18849 36325 2962 17 5472
2016-02-10 132885 26072 64903 4289 17 7816
2016-02-11 111512 15704 54962 2691 17 6559
2016-02-12 133127 16497 60692 2908 17 7831
2016-02-13 6085 429 2433 58 2 3042
2016-02-14 41886 2836 17159 513 3 13962
2016-02-15 85001 16350 38497 2379 16 5312
2016-02-16 223105 24597 147230 4245 18 12394
2016-02-17 116709 11023 33455 1998 21 5557
2016-02-18 107151 19095 40291 2906 20 5357
2016-02-19 279126 53864 152526 7301 21 13291
2016-02-20 28192 4099 12358 647 2 14096
2016-02-21 9923 793 3761 144 2 4961
2016-02-22 348047 25172 202742 4153 17 20473
2016-02-23 67093 14798 30778 1938 17 3946
2016-02-24 59589 15410 31228 2346 17 3505
2016-02-25 182339 48611 110455 7624 20 9116
2016-02-26 73464 7573 22560 1305 18 4081
2016-02-27 5680 498 1900 68 2 2840
2016-02-28 8983 938 3487 143 2 4491
2016-02-29 76747 20246 39611 2619 19 4039
2016-03-01 131868 14143 51002 2448 20 6593
2016-03-02 69039 12842 31688 2143 17 4061
2016-03-03 133869 10515 66790 2030 19 7045
2016-03-04 131507 34244 57390 5279 20 6575

218 rows × 6 columns


In [30]:
lm = smf.ols(formula="Q('Page Views') ~ count", data=df_trunc).fit()
lm.summary()

The original run raised PatsyError: "Number of rows mismatch between data argument and ['Page Views'] (218 versus 1)" because the formula was written as ['Page Views'] ~ count, and patsy evaluated ['Page Views'] as a one-element Python list rather than a column reference. A column name containing a space has to be wrapped in patsy's Q().
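The PatsyError above comes down to the space in the "Page Views" column name; patsy's Q() quoting handles it. A runnable sketch on a toy frame (the numbers are invented):

```python
import pandas as pd
import statsmodels.formula.api as smf

toy = pd.DataFrame({'Page Views': [100, 210, 290, 405, 500],
                    'count':      [1, 2, 3, 4, 5]})

# Q() lets patsy reference a column whose name contains a space
lm = smf.ols(formula="Q('Page Views') ~ count", data=toy).fit()
print(round(lm.params['count'], 1))  # fitted slope of PVs on publish count
```

Renaming the column up front (e.g. df.rename(columns={'Page Views': 'page_views'})) would work just as well and avoids the quoting entirely.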

In [ ]: